{ "cells": [ { "cell_type": "code", "execution_count": null, "source": [ "import os\n", "import torch\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "import alonet\n", "\n", "# for DETR\n", "from alonet.detr import Detr, DetrR50, DetrR50Finetune, LitDetr\n", "from alonet.detr.trt_exporter import DetrTRTExporter\n", "# for Deformable DETR\n", "from alonet.deformable_detr import (\n", " DeformableDETR, \n", " DeformableDetrR50Refinement, \n", " DeformableDetrR50RefinementFinetune)\n", "from alonet.deformable_detr.trt_exporter import DeformableDetrTRTExporter\n", "\n", "from alonet.common import pl_helpers\n", "from alonet.torch2trt import TRTExecutor\n", "\n", "import aloscene\n", "from aloscene import Frame" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "In this tutorial, we will convert DETR in TensorRT in order to reduce memory footprint and inference time.\n", "\n", "The model weight can be loaded either from a .pth file or from a run_id using `Aloception` API.\n", "\n", "This notebook might crash if there is not enough GPU memory. In this case you can reduce the image size or run only cells from either **Load from .pth checkpoint** or **Inference with TensorRT** or **Load weight from run_id**.\n", "\n", "The workflow is:\\\n", "1.Load model and trained weights\\\n", "2.Instantiate corresponding TensorRT exporter \\\n", "3.Run the export\\\n", "4.Grab a cup of coffee or take some air while waiting :)\n", "\n", "Now, let's define some constant what we will use throughout this tutorial." ], "metadata": {} }, { "cell_type": "code", "execution_count": 1, "source": [ "INPUT_SHAPE = [3, 1280, 1920] # [C, H, W], change input shape if needed\n", "BATCH_SIZE = 1\n", "PRECISION = \"fp16\" # or \"fp32\"" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "The input dimension is [B, C, H, W] of which [C, H, W] is defined by `INPUT_SHAPE` and B is defined by `BATCH_SIZE`.\n", "\n", "`PRECISION` defines the precision of model weights. It is either \"fp32\" or \"fp16\" for float32 and float16 respectively.\n", "\n", "We will run inference in a test image for qualitative comparison between PyTorch and TensorRT" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "image_path = \"PATH/TO/IMAGE\"\n", "img = Frame(image_path)\n", "frame = img.resize(INPUT_SHAPE[1:]).norm_resnet()\n", "frame = Frame.batch_list([frame])\n", "img.get_view().render()" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "# DETR" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Load from .pth checkpoint" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "In this example, we use weight trained DETR-R50 on COCO from [official repository DETR](https://github.com/facebookresearch/detr) but the workflow is valid for any finetuned model with its associated .pth file." ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "# 1. Instantiate model and load trained weight\n", "weight_path = \"PATH/TO/CHECKPOINT.pth\"\n", "num_classes = background_class = 91 # COCO classes\n", "\n", "torch_model = DetrR50(\n", " num_classes=num_classes,\n", " aux_loss=False, # we don't want auxilary outputs\n", " return_dec_outputs=False, # we don't want decoder outputs\n", ")\n", "torch_model.eval()\n", "alonet.common.load_weights(torch_model, weight_path, torch_model.device)" ], "outputs": [], "metadata": { "scrolled": false } }, { "cell_type": "code", "execution_count": null, "source": [ "# 2. Instantiate corresponding exporter\n", "\n", "model_name = \"\".join(os.path.basename(weight_path).split(\".\")[:-1])\n", "# Because the exporter will use ONNX format as an intermediate bridge \n", "# between PyTorch and TensorRT, we need to specify a path where the ONNX file will be save.\n", "onnx_path = os.path.join(os.path.dirname(weight_path), model_name + \".onnx\")\n", "\n", "exporter = DetrTRTExporter(\n", " model=torch_model,\n", " onnx_path=onnx_path,\n", " input_shapes=(INPUT_SHAPE,),\n", " input_names=[\"img\"], \n", " batch_size=BATCH_SIZE,\n", " precision=PRECISION,\n", ")" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "# 3. Run the exporter\n", "exporter.export_engine()\n", "engine_path = exporter.engine_path" ], "outputs": [], "metadata": { "scrolled": false } }, { "cell_type": "markdown", "source": [ "After the export, 2 files will be created in the root directory containing the checkpoint file.\n", "\n", "ROOT_DIR\\\n", "|__ MODEL.pth\\\n", "|__ MODEL.onnx\\\n", "|__ MODEL_PRECISION.engine\n", "\n", "The .onnx file is a ONNX graph which serves as intermediate bridge between PyTorch and TensorRT. The .engine file is the model serialized as TensorRT engine. For deployment and inference, .engine file will be deserialized and executed by TensorRT." ], "metadata": {} }, { "cell_type": "markdown", "source": [ "### Inference with TensorRT" ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "class DetrInference():\n", " def __init__(self, background_class=91) -> None:\n", " self.background_class = background_class\n", "\n", " def get_outs_filter(self, *args, **kwargs):\n", " return Detr.get_outs_filter(self, *args, **kwargs)\n", "\n", " def __call__(self, forward_out, **kwargs):\n", " forward_out = {key: torch.tensor(forward_out[key]) for key in forward_out}\n", " return Detr.inference(self, forward_out, **kwargs)" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "In other to benefit the inference logic implemented in alonet DETR without instantiating the whole model in PyTorch, we create a helper class which calls `Detr.inference` method." ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "trt_model = TRTExecutor(engine_path)\n", "trt_model.print_bindings_info()" ], "outputs": [], "metadata": { "scrolled": true } }, { "cell_type": "markdown", "source": [ "The input `img` shape is (B, C, H, W) with C=4 because we concatenate RGB image (B, 3, H, W) and its mask of shape (B, 1, H, W) containing 1 on padded pixels." ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "m_input = np.concatenate([frame.as_tensor(), frame.mask.as_tensor()], axis=1, dtype=np.float32)\n", "\n", "trt_m_outputs = trt_model(m_input)\n", "trt_pred_boxes = DetrInference(background_class=background_class)(trt_m_outputs)\n", "\n", "# visualize the result\n", "trt_pred_boxes[0].get_view(frame=img).render()" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "# compare with the result from model in PyTorch\n", "with torch.no_grad():\n", " torch_m_outputs = torch_model(frame)\n", "torch_pred_boxes = torch_model.inference(torch_m_outputs)\n", "torch_pred_boxes[0].get_view(frame=img).render()" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "A quick qualitative comparison show that 2 models give nearly identical results. The minor difference is from the fact that we use the precision float16 for our TensorRT engine which is not the case for the model in PyTorch." ], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Load weight from run_id" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "After having trained your DETR model using aloception API, we can load the model from a run_id and export it to TensorRT using the same workflow." ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "# Define the train project and the run_id from which we want to load weight\n", "project = \"YOUR_PROJECT_NAME\" \n", "run_id = \"YOUR_RUN_ID\"\n", "model_name = \"MODEL_NAME\" \n", "num_classes = ... # number of classes in your finetune model" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "# 1. Instantiate the model and load weight from run_id\n", "torch_model = DetrR50Finetune(\n", " num_classes=num_classes,\n", " aux_loss=False, # we don't want auxilary outputs\n", " return_dec_outputs=False, # we don't want decoder outputs\n", ")\n", "\n", "lit_model = pl_helpers.load_training(\n", " LitDetr, # The PyTorch Lightning Module that was used in training\n", " project_run_id=project, \n", " run_id=run_id, \n", " model=torch_model\n", ")\n", "\n", "torch_model = lit_model.model.eval()" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "# 2. Instantiate the exporter\n", "# Because the exporter will use ONNX format as an intermediate bridge \n", "# between PyTorch and TensorRT, we need to specify a path where the ONNX file will be save.\n", "project_dir, run_id_dir, _ = pl_helpers.get_expe_infos(project, run_id)\n", "onnx_path = os.path.join(run_id_dir, model_name + \".onnx\")\n", "\n", "exporter = DetrTRTExporter(\n", " model=torch_model,\n", " onnx_path=onnx_path,\n", " input_shapes=(INPUT_SHAPE,),\n", " input_names=[\"img\"], \n", " batch_size=BATCH_SIZE,\n", " precision=PRECISION,\n", ")" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "# 3. Run the exporter\n", "exporter.export_engine()\n", "engine_path = exporter.engine_path" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "# Test inference\n", "trt_model = TRTExecutor(engine_path)\n", "trt_model.print_bindings_info()" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "class DetrInference():\n", " def __init__(self, background_class=91) -> None:\n", " self.background_class = background_class\n", "\n", " def get_outs_filter(self, *args, **kwargs):\n", " return Detr.get_outs_filter(self, *args, **kwargs)\n", "\n", " def __call__(self, forward_out, **kwargs):\n", " forward_out = {key: torch.tensor(forward_out[key]) for key in forward_out}\n", " return Detr.inference(self, forward_out, **kwargs)" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "# Test inference\n", "\n", "m_input = np.concatenate([frame.as_tensor(), frame.mask.as_tensor()], axis=1, dtype=np.float32)\n", "\n", "trt_m_outputs = trt_model(m_input)\n", "trt_pred_boxes = DetrInference(background_class=num_classes)(trt_m_outputs)\n", "\n", "# visualize the result\n", "trt_pred_boxes[0].get_view(frame=img).render()" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [ "# compare with the result from model in PyTorch\n", "with torch.no_grad():\n", " torch_m_outputs = torch_model(frame)\n", "torch_pred_boxes = torch_model.inference(torch_m_outputs)\n", "torch_pred_boxes[0].get_view(frame=img).render()" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "As explained above, the comparison show that 2 models give nearly identical results. The minor difference is from the fact that we use the precision float16 for our TensorRT engine. " ], "metadata": {} } ], "metadata": { "kernelspec": { "name": "python3", "display_name": "Python 3.8.8 64-bit ('aloception': conda)" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" }, "interpreter": { "hash": "68220423dd456e0c43a9c8931fec78ccc1b1e66482ae3da0e7135ef14a46e76e" } }, "nbformat": 4, "nbformat_minor": 4 }